Modelling Out-of-Vocabulary Words for Robust Speech Recognition
Abstract
This thesis concerns the problem of unknown or out-of-vocabulary (OOV) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to a predefined, finite vocabulary. When it encounters an OOV word, a speech recognizer erroneously substitutes a similar-sounding word from its vocabulary. Furthermore, a recognition error due to an OOV word tends to spread to neighboring words, dramatically degrading overall recognition performance.
In this thesis we propose a novel approach for handling OOV words within a single-stage recognition framework. To achieve this goal, an explicit and detailed model of OOV words is constructed and then used to augment the closed-vocabulary search space of a standard speech recognizer. This OOV model achieves open-vocabulary recognition through the use of more flexible subword units that can be concatenated during recognition to form new phone sequences corresponding to potential new words. Examples of such subword units are phones, syllables, or automatically learned multi-phone sequences. Subword units have the attractive property of being a closed set, and thus are able to cover any new word, and can conceivably cover most utterances with partially spoken words as well. The main challenge with such an approach is ensuring that the OOV model does not absorb portions of the speech signal corresponding to in-vocabulary (IV) words.
In dealing with this challenge, we explore several research issues related to designing the subword lexicon, language model, and topology of the OOV model. We present a dictionary-based approach for estimating subword language models. Such language models are utilized within the subword search space to help recognize the underlying phonetic transcription of OOV words. We also propose a data-driven, iterative, bottom-up procedure for automatically creating a multi-phone subword inventory. Starting with individual phones, this procedure uses the maximum mutual information principle to successively merge phones into longer subword units.
The thesis also extends this OOV approach to modelling multiple classes of OOV words. Instead of augmenting the word search space with a single model, we add several models, one for each class of words. We present two approaches for designing the OOV word classes. The first approach relies on common part-of-speech tags. The second approach is a data-driven two-step clustering procedure: the first step uses agglomerative clustering to derive an initial class assignment, and the second step uses iterative clustering to move words from one class to another in order to reduce the model perplexity.
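The bottom-up merging of phones into longer subword units mentioned in the abstract can be illustrated with a small sketch. This is not the thesis's implementation: the scoring function (a weighted mutual information over adjacent units), the stopping rule (a fixed number of merges), and the names `best_pair`, `merge_pair`, and `learn_subwords` are all assumptions made for illustration.

```python
import math
from collections import Counter


def best_pair(sequences):
    """Return the adjacent unit pair with the highest weighted mutual information.

    Assumed score: p(a,b) * log(p(a,b) / (p(a) * p(b))), estimated from counts.
    """
    unit_counts = Counter()
    pair_counts = Counter()
    for seq in sequences:
        unit_counts.update(seq)
        pair_counts.update(zip(seq, seq[1:]))
    n_units = sum(unit_counts.values())
    n_pairs = sum(pair_counts.values())
    if n_pairs == 0:
        return None  # every sequence is already a single unit

    def score(pair):
        a, b = pair
        p_ab = pair_counts[pair] / n_pairs
        p_a = unit_counts[a] / n_units
        p_b = unit_counts[b] / n_units
        return p_ab * math.log(p_ab / (p_a * p_b))

    # Sorting first makes tie-breaking deterministic.
    return max(sorted(pair_counts), key=score)


def merge_pair(seq, pair):
    """Replace each non-overlapping occurrence of `pair` in `seq` with one joined unit."""
    merged = []
    i = 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + "_" + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_subwords(pronunciations, n_merges):
    """Start from single phones and apply up to `n_merges` greedy merges."""
    sequences = [list(p) for p in pronunciations]
    for _ in range(n_merges):
        pair = best_pair(sequences)
        if pair is None:
            break
        sequences = [merge_pair(s, pair) for s in sequences]
    inventory = sorted({u for s in sequences for u in s})
    return sequences, inventory


if __name__ == "__main__":
    # Toy pronunciation dictionary (hypothetical phone labels).
    prons = [["k", "ae", "t"], ["b", "ae", "t"], ["k", "ae", "b"]]
    seqs, inv = learn_subwords(prons, n_merges=2)
    print(seqs)
    print(inv)
```

Each merge shortens the segmented pronunciations while keeping the underlying phone sequence recoverable, which is the property that lets the resulting units still spell out arbitrary new words.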
Similar resources
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
Speech Recognition Methods and their Potential for Dialogue Systems in Mobile Environments
The DaimlerChrysler speech recognizer is specialized for robust speech recognition in noisy environments, in particular for command and control applications. The recognizer that is used in cars has fixed grammars, which restrict the speaker to using short commands. This paper presents methods that allow the user to speak more freely and add spontaneous words to the commands: language modelling,...
Using Statistical Language Modelling to Identify New Vocabulary in a Grammar-Based Speech Recognition System
Spoken language recognition meets with difficulties when an unknown word is encountered. In addition to the new word being unrecognisable, its presence impacts on recognition performance on the surrounding words. The possibility is explored here of using a back-off statistical recogniser to allow recognition of out-of-vocabulary words in a grammar-based speech recognition system. This study sho...
Publication date: 2014